(54) Voice storage and retrieval system

(57) A digital voice data storage and retrieval sys- 
tem using a low bit rate encoder which provides en- 
hanced speech signal quality while also reducing mem- 
ory size requirements. The system comprises a voice 
coder/decoder which preferably includes a digital signal 
processor (DSP) and also preferably includes a local 
memory. During encoding of the voice data, the voice 
coder/decoder receives voice input waveforms and gen- 
erates a parametric representation of the voice data. A 
storage memory is coupled to the voice coder/decoder 
for storing the parametric data. During decoding of the 
voice data, the voice coder/decoder receives the para- 
metric data from the storage memory and reproduces 
the voice waveforms. According to the invention, an interframe
smoothing method is performed on the parametric data after
encoding of all of the speech data has completed and the
parametric data has been stored in the storage memory. The
interframe smoothing is performed either in the background
after the coding process has completed or in real time during
the decoding
process immediately prior to converting the parametric 
data back to signal waveforms. Since all of the voice 
input data has already been converted to parametric da- 
ta and stored in memory, parametric data from a virtually 
unlimited number of prior and successive frames is 
available for use by the smoothing algorithm. Therefore, 
the present invention provides more accurate smooth- 
ing and provides enhanced speech signal quality over 
prior systems. 
Description

The present invention relates generally to voice 
storage and retrieval systems such as a system and 
method for performing parameter smoothing operations 
after the encoding process has completed to allow ac- 
cess to parameters in a greater number of frames and 
thus provide enhanced speech quality with reduced 
memory requirements.

Digital storage and communication of voice or 
speech signals has become increasingly prevalent in 
modern society. Digital storage of speech signals com- 
prises generating a digital representation of the speech 
signals and then storing those digital representations in 
memory. As shown in Figure 1, a digital representation
of speech signals can generally be either a waveform 
representation or a parametric representation. A wave- 
form representation of speech signals comprises pre- 
serving the "waveshape" of the analog speech signal 
through a sampling and quantization process. A para- 
metric representation of speech signals involves repre- 
senting the speech signal as a plurality of parameters 
which affect the output of a model for speech production. 
A parametric representation of speech signals is accom- 
plished by first generating a digital waveform represen- 
tation using speech signal sampling and quantization 
and then further processing the digital waveform to ob- 
tain parameters of the model for speech production. The 
parameters of this model are generally classified as ei- 
ther excitation parameters, which are related to the 
source of the speech sounds, or vocal tract response 
parameters, which are related to the individual speech 
sounds. 

Figure 2 illustrates a comparison of the waveform 
and parametric representations of speech signals ac- 
cording to the data transfer rate required. As shown, par- 
ametric representations of speech signals require a low- 
er data rate, or number of bits per second, than wave- 
form representations. A waveform representation re- 
quires from 15,000 to 200,000 bits per second to repre- 
sent and/or transfer typical speech, depending on the 
type of quantization and modulation used. A parametric 
representation requires a significantly lower number of 
bits per second, generally from 500 to 15,000 bits per 
second. In general, a parametric representation is a 
form of speech signal compression which uses a priori 
knowledge of the characteristics of the speech signal in 
the form of a speech production model. A parametric 
representation represents speech signals in the form of 
a plurality of parameters which affect the output of the 
speech production model, wherein the speech produc- 
tion model is a model based on human speech produc- 
tion anatomy. 
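
By way of illustration only, the effect of these data rates on storage can be worked out as bit rate multiplied by duration. The following sketch assumes a hypothetical 60 second message; the function name and the message length are illustrative and do not come from the described system.

```python
def storage_bytes(bits_per_second: int, seconds: float) -> float:
    """Bytes needed to store `seconds` of speech at the given bit rate."""
    return bits_per_second * seconds / 8

MESSAGE_SECONDS = 60  # assumed message length, for illustration only

# Rate ranges quoted in the text, in bits per second.
waveform_range = (15_000, 200_000)
parametric_range = (500, 15_000)

for name, (low, high) in (("waveform", waveform_range),
                          ("parametric", parametric_range)):
    print(f"{name}: {storage_bytes(low, MESSAGE_SECONDS):,.0f} to "
          f"{storage_bytes(high, MESSAGE_SECONDS):,.0f} bytes")
```

Under these assumptions the lowest waveform rate and the highest parametric rate need the same storage, which is where the two ranges meet.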

Speech sounds can generally be classified into
three distinct classes according to their mode of excitation.
Voiced sounds are sounds produced by vibration or
oscillation of the human vocal cords, thereby producing
quasi-periodic pulses of air which excite the vocal tract.
Unvoiced sounds are generated by forming a constriction
at some point in the vocal tract, typically near the
end of the vocal tract at the mouth, and forcing air
through the constriction at a sufficient velocity to produce
turbulence. This creates a broad spectrum noise
source which excites the vocal tract. Plosive sounds result
from creating pressure behind a closure in the vocal
tract, typically at the mouth, and then abruptly releasing
the air.

A speech production model can generally be partitioned
into three phases comprising vibration or sound
generation within the glottal system, propagation of the
vibrations or sound through the vocal tract, and radiation
of the sound at the mouth and to a lesser extent through
the nose. Figure 3 illustrates a simplified model of
speech production which includes an excitation generator
for sound excitation or generation and a time-varying
linear system which models propagation of sound
through the vocal tract and radiation of the sound at the
mouth. Therefore, this model separates the excitation
features of sound production from the vocal tract and
radiation features. The excitation generator creates a
signal comprised of either a train of glottal pulses or
randomly varying noise. The train of glottal pulses models
voiced sounds, and the randomly varying noise models
unvoiced sounds. The linear time-varying system models
the various effects on the sound within the vocal tract.
This speech production model receives a plurality of
parameters which affect operation of the excitation generator
and the time-varying linear system to compute an
output speech waveform corresponding to the received
parameters.

Referring now to Figure 4, a more detailed speech
production model is shown. As shown, this model includes
an impulse train generator for generating an impulse
train corresponding to voiced sounds and a random
noise generator for generating random noise corresponding
to unvoiced sounds. One parameter in the
speech production model is the pitch period, which is
supplied to the impulse train generator to generate the
proper pitch or frequency of the signals in the impulse
train. The impulse train is provided to a glottal pulse
model block which models the glottal system. The output
from the glottal pulse model block is multiplied by an
amplitude parameter and provided through a voiced/unvoiced
switch to a vocal tract model block. The random
noise output from the random noise generator is multiplied
by an amplitude parameter and is provided through
the voiced/unvoiced switch to the vocal tract model
block. The voiced/unvoiced switch is controlled by a parameter
which directs the speech production model to
switch between voiced and unvoiced excitation generators,
i.e., the impulse train generator and the random
noise generator, to model the changing mode of excitation
for voiced and unvoiced sounds.

The vocal tract model block generally relates the 
volume velocity of the speech signals at the source to 
the volume velocity of the speech signals at the lips. The
vocal tract model block receives various vocal tract parameters
which represent how speech signals are affected
within the vocal tract. These parameters include
various resonant and unresonant frequencies, referred 
to as formants, of the speech which correspond to poles 
or zeroes of the transfer function V(z). The output of the 
vocal tract model block is provided to a radiation model 
which models the effect of pressure at the lips on the 
speech signals. Therefore, Figure 4 illustrates a general 
discrete time model for speech production. The various 
parameters, including pitch, voice/unvoice, amplitude or 
gain, and the vocal tract parameters affect the operation 
of the speech production model to produce or recreate 
the appropriate speech waveforms. 

Referring now to Figure 5, in some cases it is desir- 
able to combine the glottal pulse, radiation and vocal 
tract model blocks into a single transfer function. This 
single transfer function is represented in Figure 5 by the 
time-varying digital filter block. As shown, an impulse 
train generator and random noise generator each pro- 
vide outputs to a voiced/unvoiced switch. The output 
from the switch is provided to a gain multiplier which in 
turn provides an output to the time-varying digital filter. 
The time-varying digital filter performs the operations of 
the glottal pulse model block, vocal tract model block
and radiation model block shown in Figure 4. 
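
The signal flow of the Figure 5 model can be sketched in a few lines, purely as an illustration. The sketch assumes the time-varying digital filter is an all-pole recursive filter held constant over one frame; that filter form, the function name, and the argument names are assumptions rather than details of the described system.

```python
import random

def synthesize_frame(voiced, pitch_period, gain, coeffs, n_samples,
                     history=None, seed=0):
    """Generate one frame of output from a Figure 5 style model.

    voiced       -- position of the voiced/unvoiced switch
    pitch_period -- impulse spacing in samples (used only when voiced)
    gain         -- amplitude applied after the switch
    coeffs       -- assumed all-pole filter coefficients a[1..p]
    history      -- previous outputs, most recent first (optional)
    """
    rng = random.Random(seed)
    past = list(history) if history else [0.0] * len(coeffs)
    out = []
    for n in range(n_samples):
        if voiced:
            # Impulse train generator: one impulse per pitch period.
            excitation = 1.0 if n % pitch_period == 0 else 0.0
        else:
            # Random noise generator.
            excitation = rng.uniform(-1.0, 1.0)
        # y[n] = gain * e[n] + sum over k of a[k] * y[n - k]
        y = gain * excitation + sum(a * p for a, p in zip(coeffs, past))
        out.append(y)
        past = [y] + past[:-1]
    return out

voiced_frame = synthesize_frame(True, 40, 2.0, [0.5], 160)
```

In a fuller model the coefficients, gain, pitch period and switch position would change from frame to frame, with the filter history carried across frame boundaries through the `history` argument.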

The choice of speech signal representation typically 
depends on the speech application involved. Various 
types of digital speech applications include digital stor- 
age and retrieval of speech data, digital transmission of 
speech signals, speech synthesis, speaker verification 
and identification, speech recognition, and enhance- 
ment of signal quality, among others. Most speech com- 
munication and recognition applications require real 
time encoding and transmission of speech signals. 
However, certain digital speech applications, i.e., those 
which involve digital storage and retrieval of speech da- 
ta, do not require real time transmission. For example, 
the storage and retrieval of digital speech signals in an- 
swering machine, voice mail, and digital recorder appli- 
cations do not require real time transmission of speech 
signals. 

Background on voice encoding and decoding meth- 
ods which use parametric representations of speech 
signals is deemed appropriate. A speech storage system
first receives input voice waveforms and converts
the waveforms to digital format. This involves sampling 
and quantizing the signal waveform into digital form. The 
voice encoder within the system then partitions the dig- 
ital voice data into respective frames and analyzes the 
voice data on a frame-by-frame basis. The voice encod- 
er generates a plurality of parameters which describe 
each particular frame of the digital voice data. 

After parameters have been calculated for a plurality
of frames, a smoothing method is typically applied to
the parameters in each frame to smooth out discontinuities
and thus eliminate errors in the parameter estimation
process. In general, many parameters of a speech
signal waveform, pitch for example, vary relatively slowly
in time. Therefore, a parameter that varies substantially
from one frame to the next may constitute an error
in the parameter estimation method. The smoothing
method operates by examining like parameters in respective
neighboring frames to detect discontinuities. In
other words, the smoothing algorithm compares the value
of the respective parameter being examined with like
parameters in one or more prior frames and one or more
subsequent frames to determine whether the value of
the respective parameter varies substantially from the
values of the same or like parameter in neighboring
frames. If one parameter significantly varies from neighboring
like parameters in prior and subsequent frames,
the smoothing method smooths out the discontinuity,
i.e., replaces the parameter value with a neighboring
value. Therefore, smoothing is applied to smooth
changes among parameters between consecutive
frames and thus reduce errors in the parameter estimation
process. Smoothing may involve examining related
parameters in context in order to more accurately estimate
the parameters. For example, the voicing and pitch
parameters are analyzed to ensure that a valid pitch parameter
is obtained only if the speech waveform is
voiced, and vice versa.
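
The neighbor comparison described above can be illustrated with a minimal sketch that examines one prior and one subsequent frame. The threshold, the function names, and the use of 0 to mark "no pitch" are assumptions made for the example only.

```python
def smooth_with_neighbors(values, threshold):
    """Replace isolated outliers using one prior and one subsequent frame."""
    smoothed = list(values)
    for i in range(1, len(values) - 1):
        prev_v, cur, next_v = values[i - 1], values[i], values[i + 1]
        # The neighbors agree with each other but not with the current value.
        neighbors_agree = abs(prev_v - next_v) <= threshold
        differs = (abs(cur - prev_v) > threshold
                   and abs(cur - next_v) > threshold)
        if neighbors_agree and differs:
            smoothed[i] = prev_v  # replace with a neighboring value
    return smoothed

def enforce_voicing(pitch, voiced):
    """Keep a pitch value only for voiced frames (0 marks 'no pitch')."""
    return [p if v else 0 for p, v in zip(pitch, voiced)]

pitch = [80, 81, 160, 82, 83]          # frame 2 looks like a pitch-doubling error
print(smooth_with_neighbors(pitch, threshold=10))
```

A genuine transition, such as `[80, 80, 120, 120]`, is left untouched by this sketch because the neighbors on either side of the change do not agree with each other.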

In prior art systems, smoothing is performed in real
time on a set of parameters during the encoding process
after the set of parameters has been generated and prior
to storing these parameters in the storage memory.
However, in most applications the encoding of speech
signals into a digital parametric representation must be
performed in real time with minimal delay. In fact, most
speech communication standards severely limit the
amount of delay that can be imposed in a voice transmission.
This requirement of real time encoding of
speech data limits the number of frames which can be
used in the smoothing process. In addition, maintaining
a plurality of prior and subsequent frames in the memory
used by the encoder requires increased memory size in
the encoder and thus increases the cost of the system.

As mentioned above, certain digital speech applications,
such as digital voice storage and retrieval systems,
do not require real time transmission of speech
data. Digital speech storage and retrieval applications
generally require a low bit rate for the necessary voice
coding and decoding in order to compress the speech
data as much as possible. However, it is also desirable
to provide quality voice reproduction at this low bit rate.
It is also generally desirable to reduce the memory requirements
for digital encoding, storage, and decoding
in order to reduce system cost.

We will describe an improved system and method
for digital voice storage and retrieval which
provides enhanced speech signal quality in low bit rate
speech encoders while also reducing memory requirements.
Summary of the Invention

The present invention comprises a digital voice data
storage and retrieval system, preferably using a low bit
rate encoder, which provides enhanced speech signal
quality while also reducing memory size requirements.
The system comprises a voice coder/decoder which
preferably includes a digital signal processor (DSP) and
also preferably includes a local memory. During encoding
of the voice data, the voice coder/decoder receives
voice input waveforms and generates a parametric representation
of the voice data. A storage memory is coupled
to the voice coder/decoder for storing the parametric
data. During decoding of the voice data, the voice
coder/decoder receives the parametric data from the
storage memory and reproduces the voice waveforms.
A CPU is preferably coupled to the voice coder/decoder
for controlling the operations of the voice coder/decoder.

During the coding process, voice input waveforms
are received and converted into digital data, i.e., the
voice input waveforms are sampled and quantized to
produce digital voice data. The digital voice data is then
partitioned into a plurality of respective frames, and coding
is performed on respective frames to generate a parametric
representation of the data, i.e., to generate a
plurality of parameters which describe the respective
frames of voice data. In one embodiment, smoothing is
not performed during the encoding process, but rather
the unsmoothed or "raw" parameter data is stored for
the respective frames. In another embodiment, for certain
parameters a plurality of parameter values are estimated
for each frame, and intraframe smoothing is performed
to generate a single parameter for the frame.
The intraframe smoothing process performed during encoding
does not require parametric data in prior or successive
frames for comparison and thus requires little
or no additional memory.

According to the invention, an interframe smoothing
method is performed on the parametric data after encoding
of all of the speech data has completed and the
parametric data has been stored in the storage memory.
The interframe smoothing is performed either in the
background after the coding process has completed or
in real time during the decoding process immediately
prior to converting the parametric data back to signal
waveforms. Since all of the voice input data has already
been converted to parametric data and stored in memory,
parametric data from a virtually unlimited number of
prior and successive frames is available for use by the
smoothing algorithm. Thus, the smoothing method preferably
utilizes the parameter values of a plurality of prior
and subsequent frames in smoothing parameters in
each respective frame. Therefore, the present invention
provides more accurate smoothing and provides enhanced
speech signal quality over prior systems.

As discussed in the background section, prior art 
systems perform smoothing in real time during the encoding
process and are generally limited to examining
like parameter values in a single prior and successive
frame due to the necessity of real time voice encoding. 
However, in the present invention the smoothing meth- 
od is performed after the encoding process has com- 
pleted and the parametric data has been stored. Since 
all of the parametric data is readily available, the 
smoothing method examines parametric data from a far 
greater number of prior and successive frames. There- 
fore, the system can more easily detect transitions and/ 
or correct discontinuities that occur in the speech signal 
data. This provides enhanced speech signal quality over 
prior art methods. Also, since interframe smoothing is 
not performed during encoding, extra memory is not re- 
quired for a successive or look-ahead frame during the 
encoding process. Therefore, the present invention has 
reduced memory requirements over prior designs. 
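
The benefit of a wider context can be illustrated with a short sketch. It assumes, for illustration only, that smoothing replaces a value lying far from the median of its neighboring frames; the median rule, the threshold, and the names below are assumptions and not the smoothing method of the invention.

```python
from statistics import median

def smooth_stored(values, half_window, threshold):
    """Interframe smoothing over parameters that are already stored.

    Each value is compared with the median of up to `half_window` prior
    and `half_window` successive frames; a value farther than `threshold`
    from that median is replaced by the median.
    """
    smoothed = []
    for i, cur in enumerate(values):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        context = values[lo:i] + values[i + 1:hi]
        ref = median(context) if context else cur
        smoothed.append(ref if abs(cur - ref) > threshold else cur)
    return smoothed

# A two-frame error burst in an otherwise steady pitch track.
pitch = [80] * 5 + [160, 160] + [80] * 5

narrow = smooth_stored(pitch, half_window=1, threshold=10)
wide = smooth_stored(pitch, half_window=8, threshold=10)
```

With one frame on either side, the two erroneous frames vouch for each other and the result is still distorted; with eight frames on either side, the burst is outvoted and the track is restored to a constant 80.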

In the preferred embodiment, during the smoothing 
process the system of the present invention stores par- 
ametric data in respective buffers in the DSP local mem- 
ory, preferably circular buffers, where each circular buff- 
er stores like parameters for a plurality of consecutive 
frames. In other words, parameter values of a first pa- 
rameter type from a plurality of consecutive frames are 
stored in a first circular buffer, parameter values of a second
parameter type from a plurality of consecutive
frames are stored in a second circular buffer, and so on. 
Therefore, during smoothing the DSP local memory 
comprises a plurality of circular buffers with each circu- 
lar buffer containing parameters of the same type for a 
plurality of consecutive frames. New parameter values 
are continually read into each circular buffer to maintain 
parameter data for respective prior and successive 
frames relative to the frame containing the parameter
being examined. 

In one embodiment, parameter values from seven- 
teen consecutive frames are stored in each circular buff- 
er. These seventeen frames correspond to the eight pri- 
or and eight successive frames relative to the frame con- 
taining the parameter being examined. In an alternate 
embodiment, the circular buffers vary in size for respec- 
tive parameters, and thus a different number of like pa- 
rameters are examined during the smoothing process 
for different types of parameters. In addition, in one em- 
bodiment, if the DSP decides that an even greater 
number of parameters from additional prior and subse- 
quent frames are necessary to reach a decision in the 
smoothing process, the DSP reads these additional pa- 
rameters from the storage memory to perform more in- 
telligent smoothing of that respective parameter. In yet 
another embodiment, only the respective parameters 
deemed to be the most important parameters and/or the 
most likely to be estimated improperly are stored in the 
memory local to the digital processor in order to reduce 
local memory requirements and simplify the smoothing 
process. The parameters not stored in the local memory 
are read from the random access storage memory as 
needed. 
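
A circular buffer holding seventeen like parameters can be sketched as follows, by way of example. The sketch uses one bounded `deque` per parameter type as a stand-in for a circular buffer in DSP local memory, and again assumes a median-based replacement rule; frames without a full eight-frame context on both sides are simply skipped to keep the example short.

```python
from collections import deque
from statistics import median

FRAMES_EACH_SIDE = 8                     # eight prior and eight successive frames
BUFFER_LEN = 2 * FRAMES_EACH_SIDE + 1    # seventeen frames in total

def smooth_stream(stored_values, threshold):
    """Yield smoothed values while holding only BUFFER_LEN values locally."""
    ring = deque(maxlen=BUFFER_LEN)      # stands in for one circular buffer
    for value in stored_values:          # values read from the storage memory
        ring.append(value)               # the oldest entry is dropped when full
        if len(ring) == BUFFER_LEN:
            centre = ring[FRAMES_EACH_SIDE]
            ref = median(v for i, v in enumerate(ring)
                         if i != FRAMES_EACH_SIDE)
            yield ref if abs(centre - ref) > threshold else centre

stored_pitch = [80] * 10 + [160] + [80] * 10
result = list(smooth_stream(stored_pitch, threshold=10))
```

Because each buffer has a fixed length, local memory use depends on the window size and the number of parameter types, not on the length of the stored message.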

Therefore, a digital voice storage and retrieval system
according to the present invention provides enhanced
speech signal quality. Particular embodiments
are shown and described. 

Brief Description of the Drawings 

A better understanding of the present invention can 
be obtained when the following detailed description of 
the preferred embodiment is considered in conjunction 
with the following drawings, in which: 

Figure 1 illustrates waveform representation and
parametric representation methods used for representing
speech signals;

Figure 2 illustrates a range of bit rates for the
speech representations illustrated in Figure 1;

Figure 3 illustrates a basic model for speech production;

Figure 4 illustrates a generalized model for speech
production;

Figure 5 illustrates a model for speech production
which includes a single time-varying digital filter;

Figure 6 is a block diagram of a speech storage system
according to one embodiment of the present
invention;

Figure 7 is a block diagram of a speech storage system
according to a second embodiment of the
present invention;

Figure 8 is a flowchart diagram illustrating operation
of speech signal encoding according to one embodiment
of the invention;

Figure 9 illustrates speech signal waveforms partitioned
into partially overlapping twenty millisecond
samples;

Figure 10 is a flowchart diagram illustrating an interframe
smoothing process performed in the background
after encoding of the digital voice data has
completed according to one embodiment of the invention;

Figure 11 is a flowchart diagram illustrating decoding
of encoded parameters to generate speech
waveform signals, wherein the decoding process
includes an interframe smoothing process according
to one embodiment of the invention;

Figure 12 illustrates parameter memory storage according
to a multiple access, normal ordering method; and

Figure 13 illustrates parameter memory storage according
to a single access, demand ordering method.

Voice Storage and Retrieval System 

Referring now to Figure 6, a block diagram illustrat- 
ing a voice storage and retrieval system according to 
one embodiment of the invention is shown. The voice 
storage and retrieval system shown in Figure 6 can be 
used in various applications, including digital answering
machines, digital voice mail, digital voice recorders, and
other applications which require storage and retrieval of
digital voice data. In the preferred embodiment, the
voice storage and retrieval system is used in a digital
answering machine. It is also noted that the present invention
may be used in other systems which involve the
storage and retrieval of parametric data, including video 
storage and retrieval systems, among others. 

As shown in Figure 6, the voice storage and retrieval
system preferably includes a dedicated voice coder/
decoder 102. The voice coder/decoder 102 includes a
digital signal processor (DSP) 104 and local DSP memory
106. The local memory 106 serves as an analysis
memory used by the DSP 104 in performing voice coding
and decoding functions, i.e., voice compression and
decompression, as well as parameter data smoothing.
The local memory 106 operates at a speed equivalent
to the DSP 104 and thus has a relatively fast access
time. Since the local memory 106 is required to have a
fast access time, the memory 106 is relatively costly.
One benefit of the present invention is that the invention
has reduced local memory requirements while also providing
enhanced speech quality. In the preferred embodiment,
2 Kbytes of local memory 106 are used.

The voice coder/decoder 102 is coupled to a parameter
storage memory 112. The storage memory 112 is
used for storing coded voice parameters corresponding
to the received voice input signal. In one embodiment,
the storage memory 112 is preferably low cost (slow)
dynamic random access memory (DRAM). However, it
is noted that the storage memory 112 may comprise other
storage media, such as a magnetic disk, flash memory,
or other suitable storage media. A CPU 120 is coupled
to the voice coder/decoder 102 and controls operations
of the voice coder/decoder 102, including operations
of the DSP 104 and the DSP local memory 106
within the voice coder/decoder 102. The CPU 120 also
performs memory management functions for the voice
coder/decoder 102 and the storage memory 112.

Alternate Embodiment 

Referring now to Figure 7, an alternate embodiment
of the voice storage and retrieval system is shown. Elements
in Figure 7 which correspond to elements in Figure 6
have the same reference numerals for convenience.
As shown, the voice coder/decoder 102 couples
to the CPU 120 through a serial link 130. The CPU 120
in turn couples to the parameter storage memory 112
as shown. The serial link 130 may comprise a dumb serial
bus which is only capable of providing data from the
storage memory 112 in the order that the data is stored
within the storage memory 112. Alternatively, the serial
link 130 may be a demand serial link, where the DSP
104 controls the demand for parameters in the storage
memory 112 and randomly accesses desired parameters
in the storage memory 112 regardless of how the
parameters are stored. The embodiment of Figure 7 can
also more closely resemble the embodiment of Figure
6 whereby the voice coder/decoder 102 couples directly
to the storage memory 112 via the serial link 130. In addition,
a higher bandwidth bus, such as an 8-bit or 16-bit
bus, may be coupled between the voice coder/decoder
102 and the CPU 120.

EP 0 731 348 A2

Encoding Voice Data 

Referring now to Figure 8, a flowchart diagram illus- 
trating operation of the system of Figure 6 encoding 
voice or speech signals into parametric data is shown. 
In step 202 the voice coder/decoder 102 receives voice 
input waveforms, which are analog waveforms corre- 
sponding to speech. These waveforms will typically re- 
semble the waveforms shown in Figure 9. 

In step 204 the DSP 104 samples and quantizes the
input waveforms to produce digital voice data. The DSP
104 samples the input waveform according to a desired
sampling rate. In one embodiment, the speech signal
waveform is sampled at a rate of 8 kHz or 8000 samples
per second. In an alternate embodiment, the sampling
rate is twice the Nyquist sampling rate. Other sampling
rates may be used, as desired. After sampling, the
speech signal waveform is then quantized into digital
values using a desired quantization method. In step 206
the DSP 104 stores the digital voice data or digital waveform
values in the local memory 106 for analysis by the
DSP 104.
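
As an illustration of step 204, the following sketch samples a signal at 8 kHz and applies a uniform quantizer. The text leaves the quantization method open, so the uniform quantizer, the 8-bit word size, and the 200 Hz test tone are all assumptions chosen for the example.

```python
import math

SAMPLE_RATE_HZ = 8000  # sampling rate given in the text

def sample_and_quantize(signal, duration_s, bits=8):
    """Sample a function of time (in seconds) and quantize it uniformly.

    `signal` must return values in [-1.0, 1.0]; `bits` is an assumed
    word size, not one specified by the described system.
    """
    levels = 2 ** bits
    n_samples = round(duration_s * SAMPLE_RATE_HZ)
    codes = []
    for n in range(n_samples):
        x = signal(n / SAMPLE_RATE_HZ)
        # Map [-1, 1] onto the integer codes 0 .. levels - 1.
        code = int((x + 1.0) / 2.0 * (levels - 1) + 0.5)
        codes.append(min(levels - 1, max(0, code)))
    return codes

def tone(t):
    """Hypothetical 200 Hz test tone standing in for a voice waveform."""
    return math.sin(2 * math.pi * 200 * t)

frame = sample_and_quantize(tone, 0.020)   # one 20 ms frame
```

At 8 kHz, 20 ms of input yields the 160 samples per frame referred to in the encoding description.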

While additional voice input data is being received,
sampled, quantized, and stored in the local memory 106
in steps 202-206, the following steps are performed. In
step 208 the DSP 104 performs encoding on a grouping
of frames of the digital voice data to derive a set of parameters
which describe the voice content of the respective
frames being examined. In the preferred embodiment,
linear predictive coding is performed on
groupings of four frames. However, it is noted that other
types of coding methods may be used, as desired. Also,
a greater or lesser number of frames may be encoded
at a time, as desired. For more information on digital
processing and coding of speech signals, please see
Rabiner and Schafer, Digital Processing of Speech Signals,
Prentice Hall, 1978, which is hereby incorporated
by reference in its entirety.

The DSP 104 preferably examines the speech signal
waveform in 20 ms frames for analysis and coding
into respective parameters. With a sampling rate of 8
kHz, each 20 ms frame comprises 160 samples of data.
The DSP 104 preferably examines four 20 ms frames
at a time where each frame overlaps neighboring frames
by five samples on either side, as shown in Figure 9.
The local memory 106 is preferably sufficiently large to
store up to six full frames of digital voice data. This allows
the DSP 104 to examine a grouping of four frames
and generate parameters for this grouping of four
frames while up to an additional two frames are received,
sampled, quantized and stored in the local memory
106. The local memory 106 is preferably configured
as one or more buffers, preferably circular buffers,
where newly received digital voice data overwrites voice
data from which parameters have already been generated
and stored in the storage memory 112. It is noted
that the local memory 106 may be any of various types
of memory, including registers, linear buffers, or circular
buffers, among others.
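
The framing described above can be sketched as follows. The text does not say exactly how the five-sample overlap is realized, so the sketch assumes one possible reading: frames advance by 160 samples and each analysis window reaches five samples into each neighboring frame. The function names are illustrative.

```python
FRAME_LEN = 160   # 20 ms at 8 kHz
OVERLAP = 5       # samples shared with each neighboring frame
GROUP = 4         # frames examined together

def frame_windows(samples):
    """Split samples into windows reaching OVERLAP samples into each neighbor."""
    windows = []
    for start in range(0, len(samples) - FRAME_LEN + 1, FRAME_LEN):
        lo = max(0, start - OVERLAP)
        hi = min(len(samples), start + FRAME_LEN + OVERLAP)
        windows.append(samples[lo:hi])
    return windows

def groups_of_frames(windows):
    """Yield complete groupings of GROUP consecutive windows."""
    for i in range(0, len(windows) - GROUP + 1, GROUP):
        yield windows[i:i + GROUP]

samples = list(range(FRAME_LEN * 8))       # eight frames of dummy data
windows = frame_windows(samples)
groups = list(groups_of_frames(windows))   # two groupings of four frames
```

Under this reading an interior window holds 170 samples, while the first and last windows hold 165 because they have a neighbor on one side only.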

In step 208 the DSP 104 develops a set of parameters
of different types for each 20 ms frame in the
grouping of four frames. The DSP 104 also generates
one or more parameters which span the entire four
frames. In addition, for certain parameters, the DSP 104
partitions the respective frames into two or more subframes
and generates corresponding two or more parameters
of the same type for each frame. In the preferred
embodiment, the DSP 104 generates ten linear
predictive coding (LPC) parameters for every four frames.
The DSP 104 also generates additional parameters for
each frame which represent the characteristics of the
speech signal, including a pitch parameter, a voice/unvoice
parameter, a gain parameter, a magnitude parameter,
and a multiband excitation parameter. The DSP
104 further generates a set of spectral content parameters
computed for each frame which are quantized into
one value across a grouping of frames, preferably three
frames.
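
One way to picture the resulting packet of parameters is as a small record type. The sketch below is illustrative only: the field names, field types, and validation are assumptions, and the spectral content parameters are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class FrameParams:
    """Parameters generated for one 20 ms frame (field names are illustrative)."""
    pitch: int
    voiced: bool
    gain: float
    magnitude: float
    multiband_excitation: int

@dataclass
class GroupPacket:
    """Packet of parameters for one grouping of four frames."""
    lpc: List[float]                                   # ten LPC parameters
    frames: List[FrameParams] = field(default_factory=list)

    def validate(self) -> None:
        if len(self.lpc) != 10:
            raise ValueError("expected ten LPC parameters per grouping")
        if len(self.frames) != 4:
            raise ValueError("expected parameters for four frames")

packet = GroupPacket(
    lpc=[0.0] * 10,
    frames=[FrameParams(80, True, 1.0, 0.5, 0) for _ in range(4)],
)
packet.validate()
```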

Once these parameters have been generated in
step 208, in step 210 the DSP 104 optionally performs
intraframe smoothing on selected parameters. In an embodiment
where intraframe smoothing is performed, a
plurality of parameters of the same type are generated
for each frame in step 208. Intraframe smoothing is applied
in step 210 to reduce this plurality of parameters
of the same type to a single parameter of that type. For
example, a plurality of different pitch parameter values,
for example twenty, are calculated at different points in a
frame for each frame in step 208, and in step 210 intraframe
smoothing is performed to reduce these twenty pitch
parameter values to a single pitch value representative of
the entire frame. Intraframe smoothing preferably involves
selecting a mean or median value. Alternatively, intraframe
smoothing involves developing a waveform based on
the plurality of parameter values in the frame and then
using this developed waveform to index into a listing of
parameter values based on this waveform. Intraframe
smoothing is generally performed on those parameters
which are more likely to vary within a frame. However,
as noted above, the intraframe smoothing performed in
step 210 is an optional step which may or may not be
performed, as desired.
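
The mean or median form of intraframe smoothing can be sketched in a few lines. The function name and the sample values are illustrative assumptions; the waveform-indexing alternative is not shown.

```python
from statistics import mean, median

def intraframe_smooth(estimates, method="median"):
    """Reduce several estimates of one parameter within a frame to one value."""
    if not estimates:
        raise ValueError("need at least one estimate")
    return median(estimates) if method == "median" else mean(estimates)

# Five pitch estimates taken at different points in one frame;
# one of them looks like a pitch-doubling error.
pitch_estimates = [80, 81, 160, 82, 80]
frame_pitch = intraframe_smooth(pitch_estimates)
```

The median ignores the single doubled estimate, whereas the mean would be pulled toward it, which is one reason a median may be preferred for parameters prone to occasional gross errors.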

Once the coding has been performed on the respective
grouping of frames to produce parameters in step
208, and any desired intraframe smoothing has been
performed on selected parameters in step 210, the DSP
104 stores this packet of parameters in the storage
memory 112 in step 212. Once parametric data corresponding
to a respective grouping of frames has been
generated and stored in the storage memory 112, newly
received data eventually overwrites this data in the circular
buffer in step 206, and thus the digital voice data
for this grouping of frames is removed from the local
memory 106 and hence "thrown away."

If more speech waveform data is being received by 
the voice coder/decoder 102 in step 214, then operation
returns to step 202, and steps 202 - 214 are repeated.
Thus, once a set of parameters has been generated for
a grouping of frames and stored in the storage memory
112, the DSP 104 examines the next grouping of frames
stored in local memory 106 and generates a plurality of 
parameters for this grouping, and so on. If no more voice 
data is determined to have been received in step 214, 
and thus no more digital voice data is stored in the local 
memory 106, then operation completes. 

Voice coding is performed in real time as the voice 
signal is received by the voice coder/decoder 102. In the
preferred embodiment, a system according to the 
present invention compresses the voice data to approx- 
imately 2900 bits per second (bps) of speech, which is 
approximately one-third of a bit per sample. More or less 
compression may be applied to the voice data, as de- 
sired. 
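
The relationship between the bit rate and the bits-per-sample figure can be checked with a short calculation. The 8000 samples per second rate used here is an assumption (a common telephone-band rate) and is not stated in this passage; the helper name is hypothetical.

```python
SAMPLE_RATE_HZ = 8000   # assumption: telephone-band sampling rate, not stated in the text
BIT_RATE_BPS = 2900     # approximate coded rate given in the text

# Coded bits spent per input sample: roughly one-third of a bit.
bits_per_sample = BIT_RATE_BPS / SAMPLE_RATE_HZ

def storage_bytes(seconds, bit_rate_bps=BIT_RATE_BPS):
    """Approximate storage needed for a message of the given duration."""
    return seconds * bit_rate_bps / 8
```

Under these assumptions a one-minute message would occupy a little under 22 kilobytes of parameter storage.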

It is noted that prior art systems perform an addi- 
tional interframe smoothing process on the parameter 
data generated by the DSP 104 in real time prior to storing
the parameter data in the storage memory 112. As
discussed in the background section, when interframe 
smoothing is implemented in the encoding process, the 
system is only able to examine the same or like param- 
eters in one subsequent and one prior frame for each 
parameter being examined. However, it would generally 
be desirable to examine like parameters in a plurality of 
subsequent and prior frames to perform more accurate 
smoothing. This is generally not possible during real 
time encoding because significant delays would be add- 
ed to the voice coding process. This is unacceptable for 
most voice data transmission standards. In addition, in 
systems which perform interframe smoothing during the 
encoding process, the voice coder/decoder 102 is required
to have a larger local memory 106 for storing additional
frames of voice parameter data. In cost sensitive
systems, this additional memory is undesirable. 

In applications that do not require real time
transmission of voice data, it has been determined that it is
undesirable and unnecessary to perform an interframe
smoothing process in real time during the voice coding
process. Rather, the system and method of the present
invention perform interframe smoothing operations either
in the background, after voice parameter data has
been coded and stored in the storage memory 112, or
in real time during the voice decoding process. After the
coding process has completed, i.e., after all of the voice
waveforms have been received, converted into parametric
data, and stored in the storage memory 112, all of the
parametric data is readily available in the storage
memory 112 for use during the smoothing process.
Therefore, parametric data from a virtually unlimited
number of prior and subsequent frames is available for use
by the smoothing method. Thus, more accurate smoothing can
be performed on each parameter since a greater
number of like parameters in prior and subsequent
frames are available. In addition, a system according to
the present invention requires reduced local memory
since parametric data for a look-ahead frame or
subsequent frame is no longer required to be stored in the local
memory 106 during the encoding process.

Smoothing Performed in Background 

Figure 10 is a flowchart diagram illustrating smoothing
operations being performed in the background after
encoding of the voice data has completed and all of the
parametric data has been stored in the storage memory
112 according to one embodiment of the present invention.
As mentioned above, in applications which do not
require real time voice data transmission, smoothing
operations can be performed after the voice data has been
coded into parametric data and prior to retrieval of the
parametric data, i.e., in the background. Examples of
applications where smoothing operations can be performed
in the background include digital voice answering
machines, digital tape recorders and other voice
storage and retrieval systems. For example, in a digital
answering machine application, after the caller has left
a message on the answering machine and the voice data
has been coded and stored in the storage memory
112, the DSP 104 performs smoothing operations on the
parametric data and then rewrites the smoothed parametric
data back to the storage memory 112 any time
before the message is listened to.

As shown in Figure 10, in step 222 the voice coder/decoder
102 receives parameters from multiple consecutive
frames and stores like parameters from each of
the plurality of frames in respective circular buffers in
the local memory 106. In other words, the same or like
parameters from each of the frames are stored in
respective circular buffers. Thus, all of the pitch
parameters for each of the consecutive frames are stored in one
circular buffer, the voice/unvoice parameters for each of
the consecutive frames are stored in a second circular
buffer, and so on. In the preferred embodiment, like
parameters from seventeen frames are stored
in each circular buffer to allow a parameter to be
examined in the context of its neighboring parameters from
the eight prior and eight subsequent frames. This allows
much more accurate smoothing and allows for enhanced
speech signal quality while using low bit rate
coders.
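
As an illustration of the buffering described above, the following sketch keeps one seventeen-entry circular buffer per parameter type. The use of Python's `collections.deque`, the parameter names, and the frame values are all assumptions made for illustration; the specification does not prescribe a data structure.

```python
from collections import deque

WINDOW = 17          # eight prior + current + eight subsequent frames (from the text)
HALF = WINDOW // 2   # 8

def make_buffers(parameter_names, window=WINDOW):
    """Create one fixed-length circular buffer per parameter type."""
    return {name: deque(maxlen=window) for name in parameter_names}

def push_frame(buffers, frame):
    """Append one frame's parameters; the oldest entry drops out when full."""
    for name, value in frame.items():
        buffers[name].append(value)

buffers = make_buffers(["pitch", "voiced"])
for n in range(20):                       # hypothetical frames 0..19
    push_frame(buffers, {"pitch": 50 + n, "voiced": n % 2})

# The centre entry is the frame under examination; it has eight
# neighbours on either side within the buffer.
current_pitch = buffers["pitch"][HALF]
```

Buffers of differing sizes for different parameter types could be obtained by passing a different `window` value for each parameter.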

In an alternate embodiment, a different number of
like parameters are stored in each circular buffer for
each type of parameter. In other words, the circular
buffers vary in size depending on the parameter type, and
thus certain parameters use a greater number of like
parameters from prior and subsequent frames in the
smoothing process than do others. In this embodiment,
the number of like parameters stored in a respective
circular buffer, i.e., the size of the circular buffer for a
respective parameter, depends on the number of parameters
in prior and subsequent frames required for the
smoothing process to accurately smooth the particular
parameter. Thus, if a certain parameter requires analysis
of a greater number of parameters in prior and
subsequent frames for accurate smoothing, such as the
voice/unvoice parameter, a larger circular buffer is used
for this parameter.

In step 224 the DSP 104 transforms the received
parameters into a form more suitable for smoothing. For
example, if a certain parameter is stored in a difference
format where each parameter in a frame is stored as a
difference value based on the respective parametric value
and the value of the parameter in the prior frame, this
step transforms each of the parameters into a normal or
more intelligible format, where each value represents
the true value of the parameter. In one embodiment the
DSP 104 further transforms the parametric data into a
new format using a desired transformation prior to
smoothing. This is done where the DSP 104 more
accurately smoothes the voice data in this new format.
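
The difference-format example can be sketched as a pair of inverse transforms, corresponding loosely to steps 224 and 228. The function names and the storage layout (a first absolute value followed by frame-to-frame differences) are assumptions for illustration only.

```python
from itertools import accumulate

def from_differences(first_value, diffs):
    """Recover true per-frame values from a first value plus differences."""
    return list(accumulate(diffs, initial=first_value))

def to_differences(values):
    """Inverse transform: the first value plus successive differences."""
    return values[0], [b - a for a, b in zip(values, values[1:])]

# Hypothetical stored pitch data: first frame 57, then differences.
true_values = from_differences(57, [1, -1, 57, -57])
```

After smoothing operates on the true values, `to_differences` would return the data to its stored form.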

In step 226 the DSP 104 performs smoothing for
each parameter using parameters in the eight prior and
eight subsequent frames. The smoothing process includes
first comparing the respective parameter value with the
like parameter values from the eight prior and eight
subsequent frames to determine if a discontinuity exists. If
examination of the respective parameter with reference to
the parameters in the eight prior and eight subsequent
frames reveals that a discontinuity exists and that this
discontinuity is likely an error, the smoothing process adjusts
the parameter value to more closely match neighboring
values. In one embodiment, the DSP 104 simply replaces
this discontinuous value with a neighboring value.
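
One possible reading of the comparison and replacement in step 226 is sketched below. The specification does not give a decision rule, so the rule used here (the centre value must disagree with the median of both the prior and the subsequent frames), the threshold `rel_tol`, and the function name are all assumptions. The sketch is intended for positive-valued parameters such as pitch.

```python
from statistics import median

def smooth_value(window, rel_tol=0.25):
    """Return the smoothed centre value of an odd-length window.

    The centre is compared with the median of the prior frames and the
    median of the subsequent frames. It is treated as a likely error only
    if it is far from both, so a genuine step change is left alone.
    """
    window = list(window)
    half = len(window) // 2
    centre = window[half]
    before = median(window[:half])
    after = median(window[half + 1:])

    def far_from(reference):
        return abs(centre - reference) > rel_tol * abs(reference)

    if far_from(before) and far_from(after):
        return window[half - 1]   # replace with a neighbouring (prior) value
    return centre
```

With a seventeen-entry window, an isolated doubled pitch value is replaced, while a sustained change that persists in the subsequent frames is kept.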

As noted above, since the smoothing process is
performed after the encoding operation has completed,
parameters from a much larger number of prior and
subsequent frames are available for each current
parameter being smoothed. Therefore, if a discontinuity in one
of the parameters is detected, the smoothing method of
the present invention examines parameters from a
greater number of prior and subsequent frames to
perform enhanced smoothing of the parameters prior to
decoding the parameters into speech signal waveforms.
The ability to examine parameters in a greater number
of prior and subsequent frames during the smoothing
process provides more intelligent and more accurate
smoothing of the respective parameters and thus
provides enhanced speech signal quality.

In one embodiment of the invention, if the DSP 104
determines that parameters from additional prior and
subsequent frames are necessary to reach a decision in
the smoothing process, the DSP 104 reads these additional
parameters into the local memory 106 in order to perform
more intelligent smoothing of that respective parameter.

In step 228 the DSP 104 transforms the smoothed 
parameters back into their original form, i.e., the form 
these parameters had prior to step 224. In step 230 the 
DSP 104 stores the smoothed parametric data back in 
the storage memory 112. In step 232 the DSP 104 de- 
termines if more parameter data remains in the storage 
memory 112 that has not yet been smoothed. If so, the
DSP 104 repeats steps 222 - 230 for the next set of pa- 
rameter data. If the smoothing process has been applied 
to all of the parameter data in the storage memory 112, 
then operation completes. 

Smoothing Performed During Decoding 

Referring now to Figure 11, a flowchart diagram
illustrating the voice decoding process which includes
interframe smoothing according to one embodiment of the
present invention is shown. In step 242 the local memory
106 receives parameters for multiple frames and
stores like parameters from each of the plurality of 
frames in respective circular buffers. In other words, as 
described above, all of the pitch parameters for each of 
the frames are stored in one circular buffer, the voice/ 
unvoice parameters for each of the frames are stored in 
a second circular buffer and so on. As mentioned 
above, parameters from seventeen frames are prefera- 
bly stored in each circular buffer to allow the parameters 
from the eight prior and eight subsequent frames to be 
used for the smoothing process for each parameter. 
This allows much more accurate smoothing and allows 
for enhanced speech signal quality according to the 
present invention. 

In step 244 the DSP 104 de-quantizes the data to 
obtain LPC parameters. For more information on this step,
please see Gersho and Gray, Vector Quantization and
Signal Compression, Kluwer Academic Publishers,
which is hereby incorporated by reference in its entirety. 
In step 246 the DSP 104 performs smoothing for respective
parameters in each circular buffer using parameters
in the eight prior and subsequent frames. As noted 
above, the smoothing process comprises comparing the 
respective parameter value with like parameter values 
from neighboring frames. If a discontinuity exists, and 
the discontinuity is likely an error, the DSP 104 replaces 
the discontinuous parameter with a new value, prefera- 
bly the value of a neighboring parameter. It is noted that 
steps of transforming the parameters into a more desir- 
able form for smoothing and then transforming the 
smoothed parameters back into their original form after 
smoothing may also be performed. These steps would 
be similar to steps 224 and 228 of Figure 10.

As stated above, since the smoothing process is 
performed after the encoding operation has completed, 
parameters from a much larger number of prior and
subsequent frames are available for each current
parameter being smoothed. Therefore, the smoothing method
of the present invention examines parameters from a
greater number of prior and subsequent frames to per- 
form enhanced smoothing of the parameters prior to de- 
coding the parameters into speech signal waveforms. 
The ability to examine parameters in a greater number 
of prior and subsequent frames during the smoothing 
process provides more intelligent and more accurate 
smoothing of the respective parameters and thus pro- 
vides enhanced speech signal quality. 

In one embodiment of the invention, as noted
above, if the DSP 104 decides that parameters from a 
greater number of prior and subsequent frames are 
deemed necessary to reach a decision in the smoothing 
process, the DSP 104 reads additional parameters into 
the local memory 106 in order to perform more intelligent 
smoothing of that respective parameter. However, it is 
noted that this technique is limited when smoothing is 
being performed in real time during the decode process 
since retrieving additional parameters may impose an 
undesirable delay in generating speech waveforms. 

In step 248 the DSP 104 generates speech signal 
waveforms using the smoothed parameters. The 
speech signal waveforms are generated using a speech 
production model as shown in Figures 4 or 5. For more 
information on this step, please see Rabiner and
Schafer, Digital Processing of Speech Signals,
referenced above, which is incorporated herein by reference.
In step 250 the DSP 104 determines if more parameter 
data remains to be decoded in the storage memory 112. 
If so, in step 252 the DSP 104 reads in a new parameter 
value for each circular buffer and returns to step 244. 
These new parameter values replace the least recent
prior value in the respective circular buffers and thus
allow the next parameter to be examined in the context
of its neighboring parameters in the eight prior and eight
subsequent frames. If no more parameter data remains to
be decoded in the storage memory 112 in step 250, then
operation completes. 
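
The buffer update in step 252 can be pictured as a sliding window over one parameter type, sketched below. The generator name and the padding of frames outside the recording with the nearest real value are assumptions for illustration; the de-quantizing and waveform generation steps are not shown.

```python
from collections import deque

def sliding_windows(values, half=8, nominal=None):
    """Yield (frame_index, window) with `half` prior and `half` subsequent values.

    Frames outside the recording are filled with `nominal`; when no nominal
    value is given, the nearest real value is used (a hypothetical choice).
    """
    values = list(values)
    if not values:
        return
    first = values[0] if nominal is None else nominal
    last = values[-1] if nominal is None else nominal
    padded = [first] * half + values + [last] * half
    window = deque(padded[: 2 * half + 1], maxlen=2 * half + 1)
    yield 0, list(window)
    for n in range(1, len(values)):
        # The newly read value replaces the least recent value in the buffer.
        window.append(padded[n + 2 * half])
        yield n, list(window)
```

Each yielded window is centred on frame `n`, so a smoothing rule can be applied to it before the frame is converted back to a waveform.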

In one embodiment of the present invention, during 
the smoothing process performed in either Figure 10 or 
Figure 11, only certain important parameters are main- 
tained in circular buffers in the local memory 106 to re- 
duce local memory requirements while allowing the 
DSP 104 easier access to these parameters. This em- 
bodiment is used when one or more of the parameter 
types are deemed to have greater relative importance 
and/or are more likely to experience severe discontinu- 
ities and hence erroneous parameter estimations than 
other parameters. For those parameters deemed to
have greater relative importance or which are more
likely to experience errors, a greater number of like
parameters in neighboring frames are used during the
smoothing process. Thus, these parameters are preferably
maintained in circular buffers in the local memory 106
for ease of access. Those parameters which are less
likely to have discontinuities and/or are less important
require fewer parameters for smoothing, and these
parameters are accessed as needed from the random
access storage memory 112. In the preferred embodiment,
the pitch and voicing parameters are maintained in the
local memory 106 during the smoothing process for
more efficient smoothing during the decoding process.

Examples of the Smoothing Process 

When voice coding is being performed on the pitch 
parameter value, the pitch estimation will sometimes er- 
roneously detect two times or one-half times or another 
multiple of the true value of the pitch. However, rarely in 
normal speech will the pitch of the human vocal cords 
change so substantially in 20 ms frames. Since a virtu- 
ally unlimited number of prior and subsequent frames 
are available for smoothing analysis according to the 
present invention, the DSP 104 examines the pitch pa- 
rameter from a plurality of prior and subsequent frames 
in order to perform enhanced smoothing of the
pitch parameter. This allows the DSP 104 to more ac- 
curately remove this error from the speech data prior to 
decoding the parameter data into speech waveforms. 
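
A minimal sketch of removing a doubled or halved pitch estimate is given below. It assumes that the median of the neighbouring frames is a fair estimate of the true pitch; the tolerance `tol` and the function name are hypothetical, and only the factors two and one-half are handled, not other multiples.

```python
from statistics import median

def correct_pitch_multiple(window, tol=0.15):
    """Fix a centre pitch value that looks like 2x or 1/2x its neighbours.

    window: odd-length sequence of per-frame pitch values, centre = current frame.
    tol: hypothetical relative tolerance for "close to a multiple".
    """
    window = list(window)
    half = len(window) // 2
    centre = window[half]
    reference = median(window[:half] + window[half + 1:])
    for factor in (2.0, 0.5):
        if abs(centre - factor * reference) <= tol * factor * reference:
            return centre / factor   # undo the doubling or halving
    return centre
```

A wider window gives a more reliable reference value, which is the benefit the passage attributes to smoothing after encoding has completed.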

Another parameter generated during the voice cod- 
ing process is a voice/unvoice parameter indicating 
whether the current speech waveform is a voiced signal 
or unvoiced signal. As discussed in the background sec- 
tion, a voiced speech signal involves vibration of the vo- 
cal cords. An example of a voiced sound is "ahhh" where 
the vocal cords vibrate to produce the desired sound. 
An unvoiced signal does not involve vibration of the vo- 
cal cords, but rather involves forcing air out of a con- 
striction in the vocal tract to produce a desired sound. 
An example of an unvoiced sound is "ssss." Here the 
vocal cords do not vibrate, but rather the sound is gen- 
erated by forcing air through a constriction of the vocal 
tract at the mouth. 

Most sounds in the English language are either 
voiced or unvoiced. However, some sounds, referred to 
as voiced fricatives, exhibit qualities of both, i.e., these 
sounds involve both vibration of the vocal cords and 
constriction of the vocal tract near the mouth to reduce 
air flow. An example of a speech sound which includes
both voiced and unvoiced components is "ww," where 
the sound is generated partially from vibration of the vo- 
cal cords and partially by expelling air through a con- 
stricted vocal tract. Sounds which have both voiced and 
unvoiced components require an impulse train genera- 
tor to produce the voice component of the sound as well 
as random noise to produce the unvoiced portion of the 
sound. 

In general, voicing parameter information can be 
represented by one binary value per frame, and it is un- 
desirable to transmit more than one bit per frame indic- 
ative of whether a speech signal is voiced or unvoiced. 
Thus, for a voiced speech signal, the parameter for con- 
secutive 20 ms frames would be voiced, voiced, voiced, 
voiced, voiced, etc. However, when a speech signal is 
being encoded which includes both voiced and unvoiced
characteristics, the voicing estimation may determine
that the speech waveform has a 50% voiced
content. The voice estimator preferably then dithers the 
parameters for consecutive frames to appear as voiced, 
unvoiced, voiced, unvoiced, etc. 

During smoothing of the voicing parameter, the
smoothing process examines a plurality of prior and
subsequent frames and detects the statistics of the
underlying signal as being a combination of voiced and
unvoiced sounds. For example, the smoothing process
examines parameters from a plurality of prior and
subsequent frames and determines that the current speech
sound being decoded should comprise 75% unvoiced
and 25% voiced speech. Alternatively, the smoothing
process examines the statistics of the voiced/unvoiced
parameters and detects that the current sounds being
decoded should be 50% voiced and 50% unvoiced.
Thus, in one embodiment the decoding process
provides enhanced speech signal quality by controlling the
excitation generator accordingly, i.e., by mixing the
impulse train generator and random noise generator
based on the detected percentages of voiced and
unvoiced speech. Thus the decoder produces sounds with
both voiced and unvoiced components much more
accurately.
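
The statistics-based mixing described above can be sketched as follows. The linear blend of an impulse train with uniform noise, the 160-sample frame length (20 ms at an assumed 8000 samples per second), and all names are assumptions for illustration; the specification does not define the mixing law.

```python
import random

def voiced_fraction(bits):
    """Fraction of frames marked voiced (1) in a window of one-bit flags."""
    return sum(bits) / len(bits)

def mixed_excitation(num_samples, pitch_period, fraction, rng=None):
    """Blend an impulse train with noise according to the voiced fraction."""
    rng = rng or random.Random(0)
    samples = []
    for i in range(num_samples):
        impulse = 1.0 if i % pitch_period == 0 else 0.0   # voiced component
        noise = rng.uniform(-1.0, 1.0)                    # unvoiced component
        samples.append(fraction * impulse + (1.0 - fraction) * noise)
    return samples

# Hypothetical window of dithered voicing bits around the current frame.
dithered = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
fraction = voiced_fraction(dithered)
excitation = mixed_excitation(160, 57, fraction)
```

A fraction of 1.0 reduces to a pure impulse train and a fraction of 0.0 to pure noise, matching the fully voiced and fully unvoiced cases.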

In one embodiment the smoothing process examines
parameters from a large number of prior and
subsequent frames to more accurately detect transitions
between voiced speech, unvoiced speech, and speech
having components of both voiced and unvoiced
speech. This information is then used during decoding
to reposition one or more frames to more accurately
model the speech. For example, when the smoothing
process detects that the voiced and unvoiced parameter
statistics transition from 100% voiced to 75%/25%
voiced/unvoiced to 50% voiced/unvoiced in consecutive
frames, the process not only detects that speech sounds
with both voiced and unvoiced components are required
to be generated, but also more accurately detects the
transition period between the voiced speech and the
voiced/unvoiced speech. This information is used
during the decoding process to generate enhanced and
more realistic speech waveforms.

In the method of the present invention, the smoothing
process is performed after the encoding process has
completed and the parametric data has been stored in
the storage memory 112. Where smoothing is performed
on the voicing parameter as described above,
smoothing is preferably performed during the decoding
process since representation of a frame as, for example,
75% voiced and 25% unvoiced requires more than 1 bit
for the frame.

Therefore, the present invention essentially allows
a single bit stream with one voiced/unvoiced bit per
frame to provide not only an indication of whether the
respective frame is a voiced sound or unvoiced sound,
but also, through analysis of the statistics of the voicing
parameters in consecutive frames, enhanced
speech quality. By analyzing the statistics of the voiced
and unvoiced parameters of consecutive frames, the
method accurately detects whether and by what
percentage speech sounds comprise both voiced and
unvoiced components and also more accurately detects
the transitions between voiced, unvoiced, and
voiced/unvoiced speech signals. It is noted that this is not
possible in a standard real time environment because the
decoder cannot analyze a sufficient number of frames
without inserting an unacceptable delay.

Memory Storage Configurations 

According to the invention, different parameter storage
and accessing methods may be used to ensure that
the DSP 104 receives the parameters from the storage
memory 112 in the order necessary to perform interframe
smoothing. Figure 12 illustrates a configuration of
the storage memory 112 according to one embodiment
where the storage memory 112 is a random access storage
memory, such as dynamic random access memory
(DRAM). The memory storage configuration in Figure
12 is referred to as normal ordering, whereby the
parameters for each frame are stored contiguously in the
memory sequentially according to the respective frame.
Thus, for frame n, the parameters P1(n), P2(n), and
P3(n), . . . are stored consecutively in the memory. The
parameters for frame n + 1, referred to as P1(n + 1),
P2(n + 1), and P3(n + 1), are stored consecutively after the
parameters for frame n, and so forth. Where the storage
memory 112 is a random access memory, and the DSP
104 is coupled to the storage memory 112 via a bus or
demand serial link, the DSP 104 accesses any desired
parameters in the storage memory 112. Thus, as shown
in Figure 12, when interframe smoothing is performed,
the DSP 104 accesses like parameters from a plurality
of consecutive frames for each respective circular buffer
as described above.
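
Under normal ordering, the location of any parameter follows directly from its frame number. The sketch below assumes three parameters per frame, as in the P1, P2, P3 example, and a flat list standing in for the storage memory; the function names are hypothetical.

```python
PARAMS_PER_FRAME = 3   # P1, P2, P3 in the text's example; real frames carry more

def normal_index(frame, param, params_per_frame=PARAMS_PER_FRAME):
    """Flat storage index of parameter `param` (1-based) in frame `frame` (0-based)."""
    return frame * params_per_frame + (param - 1)

def read_like_parameters(storage, param, first_frame, count):
    """Gather one parameter type from `count` consecutive frames."""
    return [storage[normal_index(f, param)]
            for f in range(first_frame, first_frame + count)]

# Hypothetical storage holding frames 0..3 as (P1, P2, P3) triples.
storage = ["P1(0)", "P2(0)", "P3(0)", "P1(1)", "P2(1)", "P3(1)",
           "P1(2)", "P2(2)", "P3(2)", "P1(3)", "P2(3)", "P3(3)"]
like_p2 = read_like_parameters(storage, 2, 1, 3)
```

The reads for one parameter type are strided rather than contiguous, which is why this layout relies on random access to the storage memory.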

Figure 12 presumes that for each parameter a
smoothing process is applied using parameters in a
certain number of prior and subsequent frames. It is noted
that a different number of prior frame parameters and
subsequent frame parameters may be used in the
smoothing process as desired. In the following example
parameters from an equal number of prior and
subsequent frames are used. In this example, for parameter
P1 a smoothing process is applied using parameters in
a certain number x1 of prior and x1 subsequent frames,
whereas the smoothing process performed on parameter
P2 uses parameters from x2 prior and x2 subsequent
frames and smoothing is applied for parameter P3 using
parameters from x3 prior and x3 subsequent frames.
Thus, the circular buffer for parameter P1 is designed to
store 2·x1 + 1 P1 parameters, the circular buffer for
parameter P2 is designed to store 2·x2 + 1 P2 parameters,
and the circular buffer for parameter P3 is designed to
store 2·x3 + 1 P3 parameters. It is noted that at the
beginning of the smoothing process when the circular
buffers are initially loaded with parameters, a limited number
of prior frames are available, i.e., frames are not
available at times before time zero. Thus, the parameters from
these "non-existent" frames are set to nominal values.
This is shown in Figure 12, whereby in the frame prior
to the current access point, the parameter P1(n-1) is not
available, whereas parameters P2(n) and P3(n+1) are
available. However, after a certain beginning number of
parameters have been examined, the respective
circular buffer will contain parameters from prior and
subsequent frames.
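
The initial loading of a buffer of 2·x + 1 entries can be sketched as follows. The nominal value and the function name are assumptions; the specification says only that parameters from non-existent frames are set to nominal values.

```python
from collections import deque

def initial_buffer(values, x, nominal):
    """Circular buffer of 2*x + 1 entries centred on frame 0.

    values: stored values of one parameter type, indexed by frame.
    x: number of prior and of subsequent frames used for smoothing.
    nominal: stand-in for frames before time zero (hypothetical choice).
    """
    buf = deque(maxlen=2 * x + 1)
    buf.extend([nominal] * x)       # "non-existent" prior frames
    buf.extend(values[: x + 1])     # frame 0 plus x subsequent frames
    return buf

p1_buffer = initial_buffer([10, 11, 12, 13, 14], x=2, nominal=0)
```

Once x further frames have been read, the nominal entries have been pushed out and the buffer holds only real parameters.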

After the circular buffers have been loaded, when
the circular buffers for each of these parameters require
a new value, the parameters are accessed from the
storage memory 112. In the example described, where x3 is
one greater than x2 and x2 is one greater than x1, a
parameter P1(n) is accessed for the circular buffer
corresponding to parameter P1, parameter P2(n + 1) is
accessed for the circular buffer corresponding to
parameter P2, and parameter P3(n + 2) is accessed for the
circular buffer corresponding to parameter P3, as shown
in Figure 12. Therefore, the memory storage scheme
shown in Figure 12 assumes that frames of parameters
are stored sequentially corresponding to the order in
which speech data is received, and the DSP 104
randomly accesses desired parameters to fill the circular
buffers during the smoothing process.

Referring now to Figure 13, a different memory storage
configuration referred to as demand ordering is
shown. The memory configuration of Figure 13
presumes a voice storage and retrieval system where the
parameters in the storage memory 112 cannot be
randomly accessed as in Figure 12. In this embodiment,
during the encoding process, the parameters generated
by the DSP 104 are not stored consecutively as in Figure
12, but rather are stored based on how these parameters
are required to be accessed to perform the interframe
smoothing process. Thus, instead of ordering the
parameters by frame and accessing the parameters
P1(n), P2(n+1) and P3(n+2) from non-consecutive
locations as shown in Figure 12, the parameters are
"demand" ordered whereby the parameters P1(n), P2(n+1)
and P3(n+2) are stored consecutively in the memory
112. It is noted that this embodiment requires that the
local memory 106 queue the parameter values during
the encoding process, so that the parameters are
transferred to the storage memory 112 in the necessary order
to store these parameters as shown in Figure 13.
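
Demand ordering can be illustrated by reordering per-frame parameters into the sequence in which the smoother would read them. The `lookahead` mapping (extra frames of look-ahead relative to the parameter with the shortest window), the function name, and the sample data are assumptions for illustration.

```python
def demand_order(frames, lookahead):
    """Reorder per-frame parameters into the sequence the smoother reads them.

    frames: list of dicts, one per frame, mapping parameter name -> value.
    lookahead: dict mapping parameter name -> extra frames of look-ahead
               relative to the parameter with the shortest window.
    """
    stream = []
    max_extra = max(lookahead.values())
    # Negative steps cover the initial load of the longer buffers.
    for n in range(-max_extra, len(frames)):
        for name, extra in lookahead.items():
            frame = n + extra
            if 0 <= frame < len(frames):
                stream.append((name, frame, frames[frame][name]))
    return stream

frames = [{"P1": f"a{n}", "P2": f"b{n}", "P3": f"c{n}"} for n in range(4)]
stream = demand_order(frames, {"P1": 0, "P2": 1, "P3": 2})
```

In the steady state the stream contains P1(n), P2(n+1) and P3(n+2) consecutively, so a serial read delivers the values in the order the circular buffers need them.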

In an embodiment where the storage memory 112
is a random access memory and the DSP 104 randomly
accesses any parameters from the storage memory
112, a normal ordering storage method is preferably
used as shown in Figure 12. In an embodiment where
a demand serial link is used, such as that shown in
Figure 7, the normal ordering storage method of Figure 12
is also preferably used. However, the storage method
of Figure 13 may be used in this embodiment as desired.
Where a dumb serial link 130 is used between the DSP
104 and the storage memory 112, the storage method
of Figure 13 is preferably used.

Referring again to Figure 7, if the serial link 130 is
a dumb serial link, then during the encoding process of
Figure 8, the DSP 104 stores the parameters in the
storage memory 112 based on the order that these
parameters are required to be accessed by the DSP 104 during
a subsequent smoothing process. As noted above, this
requires that the local memory 106 queue the parameter
values during the encoding process to enable the DSP
104 to transfer these parameters to the storage memory
112 in the necessary order. Alternatively, the parametric
data may be stored in a normal ordering fashion as
shown in Figure 12. In this embodiment, as the DSP 104
reads the parameter data during the interframe
smoothing process, this parameter data is queued in the local
memory 106 and the parameters are then provided to
the DSP 104 in the desired order for smoothing.
Therefore, in an embodiment where a dumb serial link 130 is
used, the voice coder/decoder 102 requires a sufficiently
large local memory 106 to queue a potentially large
number of parameter values regardless of the storage
method used.

Conclusion 

Therefore a system and method for storing and gen- 
erating speech signals with enhanced quality using very 
low bit rate coders is shown and described. The system 
and method of the present invention perform a smoothing
process after the parameter encoding has completed,
where access to parameters in a greater number of
prior and subsequent frames is available for the
smoothing process. As noted above, the present inven- 
tion may be applied to other systems that involve the 
storage and retrieval of parametric data, including video 
storage and retrieval systems, among others. The 
present invention may also be applied to real time data 
communication systems which have sufficient system 
bandwidth and processing power to store the parametric 
data and apply smoothing using a plurality of prior and 
subsequent frames during real time transmission. 

Although the method and apparatus of the present 
invention have been described in connection with the
preferred embodiment, it is not intended to be limited to the
specific form set forth herein, but on the contrary it is 
intended to cover such alternatives, modifications, and 
equivalents, as can be reasonably included within the 
spirit and scope of the invention as defined by the ap- 
pended claims. 

The present invention therefore provides, accord- 
ing to a first aspect, a method for storage and retrieval 
of digital voice data, comprising the steps of: 

receiving input voice waveforms; 

converting said input voice waveforms into digital
voice data;

encoding said digital voice data into a plurality of
parameters for each of a plurality of frames of said
digital voice data;

storing said plurality of parameters in a storage 
memory; 

reading said plurality of parameters from said stor- 
age memory after said steps of encoding said digital 
voice data and storing said plurality of parameters; 
and 

smoothing said plurality of parameters to remove 
discontinuities from said plurality of parameters af- 
ter said step of reading said plurality of parameters 
from said storage memory. 

The present invention also provides a digital voice 
storage and retrieval system which provides enhanced 
speech quality, comprising: 



discontinuities from said plurality of parameters
after said step of reading said plurality of parameters
from said storage memory. 

Preferably, said step of smoothing produces a 
smoothed plurality of parameters, the method further 
comprising: 

storing said smoothed plurality of parameters in 
said storage memory after said step of smoothing. 

Preferably, for one or more of said plurality of pa- 
rameters, said step of smoothing comprises: 

comparing a first parameter in a first frame with like 
parameters from a plurality of prior frames and a 
plurality of subsequent frames to determine if said 
first parameter varies substantially from said like 
parameters from said plurality of prior frames and 
said plurality of subsequent frames; and 
replacing said first parameter with a new value if 
said step- of comparing indicates that said first pa- 
rameter varies substantially from said like parame- 
ters from said plurality of prior frames and said plu- 
rality of subsequent frames. 

Preferably, said step of smoothing further compris- 
es: 

reading additional like parameters from said stor- 
age memory after said step of comparing if said step 
of comparing indicates that said first parameter var- 
ies substantially from said like parameters in said 
plurality of prior frames and said plurality of subse- 
quent frames; and 

comparing said first parameter with said additional 
like parameters read in said step of reading said ad- 
ditional parameters to determine if said first param- 
eter varies substantially. 

Preferably, said step of encoding generates a plu- 
rality of parameters of different types for each of said 
plurality of frames; and 

wherein said step of reading said plurality ol param- 
eters from said storage memory includes storing 
ones of said plurality of parameters in a plurality of 
buffers, wherein parameters of the same type from 
a plurality of said plurality of frames are stored in 
each of said plurality of buffers. 

Preferably, said plurality of buffers have differing 
sizes for different types of parameters. 

Preferably, said step of storing said plurality of pa- 
rameters in said plurality of buffers comprises storing a 
first number of parameters of a first type in a first buffer 
and storing a second number of parameters of a second 
type in a second buffer, whereby said first number is dif- 



a processor which receives input voice waveforms 
and generates a plurality of parameters represent- 
ative of said input voice waveforms, wherein said 
input voice waveforms can be partitioned into a plu- 20 
rality of frames and said processor generates said 
plurality of parameters for said plurality of frames of 
said input voice waveforms; 

a memory store coupled to said processor for stor- 
ing said plurality of parameters; 2s 
a local memory coupled to said processor for stor- 
ing a first plurality of said plurality of parameters, 
wherein said first plurality of parameters includes a 
first parameter in a first frame being smoothed and 
like parameters from a plurality of prior and subse- 30 
quent frames relative to said first frame; 
wherein said processor reads said first plurality of 
parameters from said memory store and stores said 
first plurality of parameters in said local memory; 
wherein said processor performs smoothing oper- 35 
ations on said first parameter in said local memory 
after reading said first plurality of parameters from 
said memory store and storing said first plurality of 
parameters in said local memory. 

40 

According to a further aspect, the invention pro- 
vides a method for storage and retrieval of digital para- 
metric data, comprising the steps of: 

receiving input digital data; 45 

encoding said digital data into a plurality of param- 
eters for each of a plurality of frames of said digital 
data; 

so 

storing said plurality of parameters in a storage 
memory; 

reading said plurality of parameters from said stor- 
age memory after said steps of encoding said digital ss 
data and storing said plurality of parameters; and 

smoothing said plurality of parameters to remove 



12 

EP 0731348A2J > 



23 



EP 0 731 348 A2 



24 



ferent than said second number. 

Preferably, said plurality of buffers comprise a plu- 
rality of circular buffers. 
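
A per-type circular buffer arrangement of this kind can be sketched in Python with bounded deques. The parameter names and the buffer sizes below are hypothetical. The size 17 is chosen only because it would hold a current frame plus eight prior and eight subsequent frames, the window recited in claim 7.

```python
from collections import deque

# Hypothetical window sizes (frames held per buffer); not taken from the source.
BUFFER_SIZES = {"pitch": 17, "gain": 9, "voicing": 5}

# One circular buffer per parameter type, each with its own capacity.
buffers = {name: deque(maxlen=size) for name, size in BUFFER_SIZES.items()}

def load_frame(frame):
    """Append each parameter of one frame to the buffer for its type."""
    for name, value in frame.items():
        buffers[name].append(value)   # when full, the oldest entry is dropped

for n in range(20):
    load_frame({"pitch": 100 + n, "gain": 0.5, "voicing": n % 2})
```

A bounded deque discards its oldest entry on overflow, which is the behaviour a circular buffer provides as new frames are read in.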

Preferably, said step of encoding generates a plurality
of parameters of different types for each of said
plurality of frames; and

wherein said step of reading said plurality of parameters
from said storage memory includes storing
ones of said plurality of parameters in one or more
buffers, wherein parameters of a first type are
stored in a first buffer and parameters of a second
type remain in said storage memory and are not
stored in a buffer;

wherein said step of smoothing comprises:

comparing a first parameter of said first type in
said first buffer with other parameters of said
first type in said first buffer to determine if said
first parameter varies substantially from said
other parameters in said first buffer;

replacing said first parameter with a new value
if said step of comparing indicates that said first
parameter varies substantially from said other
parameters in said first buffer;

reading parameters of said second type from
said storage memory from a plurality of said
plurality of frames;

comparing a first parameter of said parameters
of said second type with other parameters of
said second type;

replacing said first parameter of said parameters
of said second type with a new value if said
step of comparing indicates that said first parameter
of said parameters of said second type
varies substantially from other parameters of 
said second type. 
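
The compare-and-replace steps above can be sketched as follows. The text does not say how "varies substantially" is measured or what the new value is, so the relative threshold and the use of the neighbours' median as the replacement are assumptions, as are the names `replace_if_outlier`, `pitch` and `gain`.

```python
from statistics import median

def replace_if_outlier(values, index, rel_threshold=0.5):
    """Replace values[index] with the median of the other entries when it
    differs from that median by more than rel_threshold (relative).
    Returns True if a replacement was made."""
    others = values[:index] + values[index + 1:]
    ref = median(others)
    if abs(values[index] - ref) > rel_threshold * abs(ref):
        values[index] = ref
        return True
    return False

# Hypothetical storage memory: one dict of parameters per frame.
storage = [{"pitch": p, "gain": g} for p, g in
           zip([100, 102, 101, 200, 99, 101, 100],
               [0.50, 0.52, 0.49, 0.51, 0.50, 2.00, 0.51])]

# First type: copied into a local buffer and smoothed there.
pitch_buffer = [frame["pitch"] for frame in storage]
pitch_fixed = replace_if_outlier(pitch_buffer, 3)

# Second type: no standing buffer; values are read from storage on demand,
# checked, and the corrected value is written straight back.
gains = [frame["gain"] for frame in storage]
gain_fixed = replace_if_outlier(gains, 5)
storage[5]["gain"] = gains[5]
```

Keeping a standing buffer only for some parameter types trades local memory for extra reads of the storage memory.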

Preferably, said step of encoding comprises generating
a plurality of like parameters for a first type of parameter
in one or more of said plurality of frames, the
method further comprising:

performing intraframe smoothing on said plurality of
like parameters of said first type for each of said one
or more of said plurality of frames, wherein said step
of performing intraframe smoothing generates a
single parameter value of said first type based on
said plurality of parameter values of said first type
for each of one or more of said plurality of said
frames. 
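
As a minimal illustration of intraframe smoothing, the sketch below reduces several like values from one frame (for example, one estimate per subframe) to a single value. Taking the median is an assumption made for the example. The text only requires that one value be generated from the several.

```python
from statistics import median

def intraframe_smooth(subframe_values):
    """Collapse several like parameter values from one frame into one value."""
    return median(subframe_values)

# Hypothetical pitch estimates, four per frame.
frames = [[100, 101, 99, 100], [100, 50, 101, 102], [98, 99, 100, 101]]
per_frame = [intraframe_smooth(values) for values in frames]
```

In the second frame the stray estimate of 50 has little effect on the single value kept for that frame.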

Preferably, said method further comprises:

transforming said plurality of parameters from a first
form to a second form more suitable for smoothing, 
wherein said step of transforming is performed after 
said step of reading said plurality of parameters 



from said storage memory and prior to said step of 
smoothing said plurality of parameters; 
transforming said smoothed plurality of parameters 
back to said first form after said step of smoothing 
said plurality of parameters; and 
storing said plurality of parameters in said storage 
memory after said step of transforming said 
smoothed plurality of parameters to said first form. 
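
The transform, smooth, inverse-transform, store sequence can be sketched as below. The logarithmic domain used as the "second form" and the three-point median filter are assumptions chosen to keep the example short. The function names are likewise hypothetical.

```python
import math
from statistics import median

def to_smoothing_form(gains):
    """First form -> second form (here: natural log, an assumed choice)."""
    return [math.log(g) for g in gains]

def to_original_form(values):
    """Second form -> first form."""
    return [math.exp(v) for v in values]

def median3(values):
    """Three-point median filter; the two end values are left unchanged."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = median(values[i - 1:i + 2])
    return out

storage = {"gain": [1.0, 1.0, 100.0, 1.0, 1.0]}   # hypothetical stored track

working = to_smoothing_form(storage["gain"])      # transform after reading
working = median3(working)                        # smooth in the second form
storage["gain"] = to_original_form(working)       # transform back, then store
```

The stored data keeps its original form throughout, so a decoder that reads the storage memory needs no knowledge of the intermediate representation.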

Preferably, said input digital data comprises voice 
data.

Preferably, said input digital data comprises video 
data. 

According to a fourth aspect, the invention provides 
a digital data storage and retrieval system which pro- 
vides enhanced signal quality, comprising: 

a processor which receives input digital data and 
generates a plurality of parameters representative 
of said input digital data, wherein said input digital 
data can be partitioned into a plurality of frames and 
said processor generates said plurality of parame- 
ters for said plurality of frames of said input digital 
data; 

a memory store coupled to said processor for stor- 
ing said plurality of parameters; 
a local memory coupled to said processor for stor- 
ing a first plurality of said plurality of parameters, 
wherein said first plurality of parameters includes a 
first parameter in a first frame being smoothed and 
like parameters from a plurality of prior and subsequent
frames relative to said first frame;
wherein said processor reads said first plurality of 
parameters from said memory store and stores said 
first plurality of parameters in said local memory; 
wherein said processor performs smoothing oper- 
ations on said first parameter in said local memory 
after reading said first plurality of parameters from 
said memory store and storing said first plurality of 
parameters in said local memory. 

Preferably, said processor stores said smoothed 
first plurality of parameters in said storage memory after 
performing said smoothing operations on said first plu- 
rality of parameters in said local memory. 

Preferably, said processor performs smoothing op- 
erations on said first parameter in said local memory us- 
ing said like parameters from said plurality of prior and 
subsequent frames. 

Preferably, said processor comprises: 

means for comparing said first parameter in said 
first frame with said like parameters from said plu- 
rality of prior and subsequent frames to determine 
if said first parameter varies substantially from said 
like parameters from said plurality of prior and sub- 
sequent frames; and 

means for replacing said first parameter with a new
value if said means for comparing determines that
said first parameter varies substantially from said
like parameters from said plurality of prior and subsequent
frames.

Preferably, said processor reads additional like parameters
from said memory store after operation of said
means for comparing if said means for comparing determines
that said first parameter varies substantially
from said like parameters in said plurality of prior and
subsequent frames; and

wherein said means for comparing compares said
first parameter with said additional like parameters
to determine if said first parameter varies substantially.
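
One way to read this two-stage check is that a suspect value is confirmed against a wider window before it is replaced. The sketch below follows that reading. The window sizes, the relative threshold and the median reference are assumptions, and `confirm_outlier` is a hypothetical name.

```python
from statistics import median

def deviates(value, others, rel_threshold=0.25):
    """True when value differs substantially from the median of others."""
    ref = median(others)
    return abs(value - ref) > rel_threshold * abs(ref)

def confirm_outlier(track, i, radius=2, extra=4):
    """First compare against a narrow window; only if that flags the value,
    read additional frames and compare again against the wider window."""
    near = track[max(0, i - radius):i] + track[i + 1:i + 1 + radius]
    if not deviates(track[i], near):
        return False
    wide = (track[max(0, i - radius - extra):i]
            + track[i + 1:i + 1 + radius + extra])
    return deviates(track[i], wide)

dip = [200, 200, 200, 200, 100, 100, 200, 200, 200, 200, 200]
spike = [100, 100, 100, 100, 100, 300, 100, 100, 100, 100, 100]
```

With these tracks, `confirm_outlier(dip, 6)` is rejected at the second stage because the wider window shows that 200 is the normal level, while `confirm_outlier(spike, 5)` is confirmed at both stages.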

Preferably, said processor generates a plurality of 
parameters of different types for each of said plurality of 
frames of said input digital data;

wherein said local memory includes a plurality of
buffers corresponding to said parameters of different
types;

wherein said processor reads said parameters from
said memory store and stores said parameters of
the same type in said buffers in said local memory.

Preferably, said plurality of buffers have differing
sizes for different types of parameters.

Preferably, said input digital data comprises voice 
data. 

Preferably, said input digital data comprises video 
data. 



Claims 

1. A method for storage and retrieval of digital voice
data, comprising the steps of:

receiving input voice waveforms;

converting said input voice waveforms into digital
voice data;

encoding said digital voice data into a plurality
of parameters for each of a plurality of frames
of said digital voice data;

storing said plurality of parameters in a storage
memory;

reading said plurality of parameters from said
storage memory after said steps of encoding
said digital voice data and storing said plurality
of parameters; and

smoothing said plurality of parameters to remove
discontinuities from said plurality of parameters
after said step of reading said plurality
of parameters from said storage memory.

2. The method of claim 1, wherein said step of smoothing
produces a smoothed plurality of parameters,
the method further comprising:

generating speech signal waveforms based
on said smoothed plurality of parameters after said
step of smoothing.

3. The method of claim 1, wherein said step of smoothing
produces a smoothed plurality of parameters,
the method further comprising: 

storing said smoothed plurality of parameters 
in said storage memory after said step of smooth- 
ing. 

4. The method of claim 3, further comprising: 

reading said smoothed plurality of parameters 
from said storage memory after said step of 
storing said smoothed plurality of parameters; 
and 

generating speech signal waveforms based on 
said smoothed plurality of parameters after said 
step of reading said smoothed plurality of pa- 
rameters from said storage memory. 

5. The method of claim 1, wherein, for one or more of
said plurality of parameters, said step of smoothing 
comprises: 

comparing a first parameter in a first frame with 
like parameters from a plurality of prior frames 
and a plurality of subsequent frames to deter- 
mine if said first parameter varies substantially 
from said like parameters from said plurality of 
prior frames and said plurality of subsequent 
frames; and 

replacing said first parameter with a new value 
if said step of comparing indicates that said first 
parameter varies substantially from said like 
parameters from said plurality of prior frames 
and said plurality of subsequent frames. 

6. The method of claim 5, wherein said step of com- 
paring comprises comparing said first parameter in 
said first frame with like parameters from a plurality 
of prior consecutive frames and a plurality of sub- 
sequent consecutive frames. 

7. The method of claim 6, wherein said step of com- 
paring comprises comparing said first parameter in 
said first frame with like parameters from eight prior 
consecutive frames and eight subsequent consecutive
frames.

8. The method of claim 5, wherein said step of smooth- 
ing further comprises: 

reading additional like parameters from said
storage memory after said step of comparing if
said step of comparing indicates that said first
parameter varies substantially from said like
parameters in said plurality of prior frames and
said plurality of subsequent frames; and

comparing said first parameter with said additional
like parameters read in said step of reading
said additional parameters to determine if
said first parameter varies substantially.

9. The method of claim 1, wherein said step of encoding
generates a plurality of parameters of different
types for each of said plurality of frames; and

wherein said step of reading said plurality of
parameters from said storage memory includes
storing ones of said plurality of parameters in a plurality
of buffers, wherein parameters of the same
type from a plurality of said plurality of frames are
stored in each of said plurality of buffers.

10. The method of claim 9, wherein, for each of said
buffers, said step of smoothing comprises:

comparing a first parameter in a first buffer with
other parameters in said first buffer to determine
if said first parameter varies substantially
from said other parameters in said first buffer;
and

replacing said first parameter with a new value
if said step of comparing indicates that said first
parameter varies substantially from said other
parameters in said first buffer.

11. The method of claim 9, wherein said plurality of buffers
have differing sizes for different types of parameters.

12. The method of claim 11, wherein said step of storing
said plurality of parameters in said plurality of
buffers comprises storing a first number of parameters
of a first type in a first buffer and storing a second
number of parameters of a second type in a
second buffer, whereby said first number is different
than said second number.

13. The method of claim 9, wherein said plurality of buffers
comprise a plurality of circular buffers.

14. The method of claim 1, wherein said step of encoding
generates a plurality of parameters of different
types for each of said plurality of frames; and

wherein said step of reading said plurality of parameters
from said storage memory includes
storing ones of said plurality of parameters in
one or more buffers, wherein parameters of a
first type are stored in a first buffer and parameters
of a second type remain in said storage
memory and are not stored in a buffer;
wherein said step of smoothing comprises: 

comparing a first parameter of said first 
type in said first buffer with other parame- 
ters of said first type in said first buffer to 
determine if said first parameter varies 
substantially from said other parameters in 
said first buffer; 

replacing said first parameter with a new 
value if said step of comparing indicates 
that said first parameter varies substantial- 
ly from said other parameters in said first 
buffer; 

reading parameters of said second type 
from said storage memory from a plurality 
of said plurality of frames; 
comparing a first parameter of said param- 
eters of said second type with other param- 
eters of said second type; 
replacing said first parameter of said pa- 
rameters of said second type with a new 
value if said step of comparing indicates 
that said first parameter of said parameters 
of said second type varies substantially 
from other parameters of said second type. 

15. The method of claim 1, wherein said step of encoding
comprises generating a plurality of like parameters
for a first type of parameter in one or more of
said plurality of frames, the method further compris- 
ing: 

performing intraframe smoothing on said plu- 
rality of like parameters of said first type for each of 
said one or more of said plurality of frames, wherein 
said step of performing intraframe smoothing gen- 
erates a single parameter value of said first type 
based on said plurality of parameter values of said 
first type for each of one or more of said plurality of 
said frames. 

16. The method of claim 1, further comprising: 

transforming said plurality of parameters from 
a first form to a second form more suitable for 
smoothing, wherein said step of transforming is 
performed after said step of reading said plu- 
rality of parameters from said storage memory 
and prior to said step of smoothing said plurality 
of parameters; 

transforming said smoothed plurality of param- 
eters back to said first form after said step of 
smoothing said plurality of parameters; and 
storing said plurality of parameters in said stor- 
age memory after said step of transforming said 
smoothed plurality of parameters to said first 
form.

17. The method of claim 1, further comprising storing
said digital voice data in a memory prior to said step
of encoding, wherein said digital voice data can be
partitioned into a plurality of frames of digital voice
data.

18. A digital voice storage and retrieval system which
provides enhanced speech quality, comprising:

a processor which receives input voice waveforms
and generates a plurality of parameters
representative of said input voice waveforms,
wherein said input voice waveforms can be partitioned
into a plurality of frames and said processor
generates said plurality of parameters for
said plurality of frames of said input voice waveforms;

a memory store coupled to said processor for
storing said plurality of parameters;

a local memory coupled to said processor for
storing a first plurality of said plurality of parameters,
wherein said first plurality of parameters
includes a first parameter in a first frame being
smoothed and like parameters from a plurality
of prior and subsequent frames relative to said
first frame;

wherein said processor reads said first plurality
of parameters from said memory store and
stores said first plurality of parameters in said
local memory;

wherein said processor performs smoothing
operations on said first parameter in said local
memory after reading said first plurality of parameters
from said memory store and storing
said first plurality of parameters in said local
memory.

19. The digital voice storage and retrieval system of
claim 18, wherein said processor generates speech
signal waveforms based on said first plurality of parameters
after performing smoothing operations on
said first plurality of parameters in said local memory.

20. The digital voice storage and retrieval system of
claim 18, wherein said processor stores said
smoothed first plurality of parameters in said storage
memory after performing said smoothing operations
on said first plurality of parameters in said
local memory.

21. The digital voice storage and retrieval system of
claim 20, wherein said processor generates speech
signal waveforms based on said first plurality of parameters
after performing smoothing operations on
said first plurality of parameters in said local memory
and after said processor stores said smoothed
first plurality of parameters in said storage memory.

22. The digital voice storage and retrieval system of
claim 18, wherein said processor performs smoothing
operations on said first parameter in said local
memory using said like parameters from said plurality
of prior and subsequent frames.

23. The digital voice storage and retrieval system of
claim 22, wherein said processor comprises:

means for comparing said first parameter in
said first frame with said like parameters from
said plurality of prior and subsequent frames to
determine if said first parameter varies substantially
from said like parameters from said
plurality of prior and subsequent frames; and

means for replacing said first parameter with a
new value if said means for comparing determines
that said first parameter varies substantially
from said like parameters from said plurality
of prior and subsequent frames.

24. The digital voice storage and retrieval system of 
claim 23, wherein said processor reads additional 
like parameters from said memory store after oper- 
ation of said means for comparing if said means for 
comparing determines that said first parameter var- 
ies substantially from said like parameters in said 
plurality of prior and subsequent frames; and 

wherein said means for comparing compares 
said first parameter with said additional like param- 
eters to determine if said first parameter varies sub- 
stantially. 

25. The digital voice storage and retrieval system of 
claim 18, wherein said processor generates a plu- 
rality of parameters of different types for each of 
said plurality of frames of said voice input wave- 
forms; 

wherein said local memory includes a plurality 
of buffers corresponding to said parameters of 
different types; 

wherein said processor reads said parameters 
from said memory store and stores said param- 
eters of the same type in said buffers in said 
local memory. 

26. The digital voice storage and retrieval system of 
claim 25, wherein said plurality of buffers have dif- 
fering sizes for different types of parameters. 

27. A method for storage and retrieval of digital para- 
metric data, comprising the steps of: 

receiving input digital data; 
encoding said digital data into a plurality of pa- 
rameters for each of a plurality of frames of said 
digital data;

storing said plurality of parameters in a storage
memory;

reading said plurality of parameters from said
storage memory after said steps of encoding
said digital data and storing said plurality of parameters;
and

smoothing said plurality of parameters to remove
discontinuities from said plurality of parameters
after said step of reading said plurality
of parameters from said storage memory.

28. A digital data storage and retrieval system which 
provides enhanced signal quality, comprising: 

a processor which receives input digital data
and generates a plurality of parameters representative
of said input digital data, wherein said
input digital data can be partitioned into a plurality
of frames and said processor generates
said plurality of parameters for said plurality of
frames of said input digital data;

a memory store coupled to said processor for
storing said plurality of parameters;

a local memory coupled to said processor for
storing a first plurality of said plurality of parameters,
wherein said first plurality of parameters
includes a first parameter in a first frame being
smoothed and like parameters from a plurality
of prior and subsequent frames relative to said
first frame;

wherein said processor reads said first plurality
of parameters from said memory store and
stores said first plurality of parameters in said
local memory;

wherein said processor performs smoothing
operations on said first parameter in said local
memory after reading said first plurality of parameters
from said memory store and storing
said first plurality of parameters in said local
memory.



[Figure (prior art): representation of speech signals, divided into waveform representations and parametric representations; the parametric representations comprise excitation parameters and vocal tract parameters.]

[Figure (prior art): range of bit rates for various types of speech representations. Axis: data rate (bits per second), with ticks at 200 000, 60 000, 20 000, 10 000, 500 and 75. Labels: waveform representations (LDM, PCM, DPCM, ADM; no source coding) and parametric representations (analysis-synthesis methods, synthesis from printed text; source coding).]

[Figure (prior art): source-system model of speech production. Labels: excitation generator, time-varying linear system, speech output.]



[Figures (prior art): block diagrams of speech production models. Recoverable labels: pitch period, impulse train generator, glottal pulse model G(z), random noise generator, voiced/unvoiced switch, vocal tract parameters, vocal tract model V(z), radiation model R(z), time-varying digital filter, output s(n).]



[Figures: block diagrams of the voice storage and retrieval system. Recoverable labels: voice input/output, voice coder/decoder, DSP, local memory, parameter storage memory, CPU and, in the second diagram, serial link. Reference numerals shown: 102, 104, 106, 112, 120 and 130.]



[Figure: flowchart, "Receive input waveforms; encode voice data" (reference numerals 202 to 212). Steps in order: receive voice input waveforms; sample and quantize input waveform to produce digital voice data; store digital voice data in local memory; perform linear predictive coding on a grouping of frames and subframes within frames to generate parameters; perform intraframe smoothing on selected parameters; store packet of parameters for frame grouping. A decision block with a YES branch follows; its text is not recoverable.]



[Figure: frame timing diagram; the only recoverable labels are two 20 ms intervals.]



[Figure: flowchart, "Perform interframe smoothing in background" (reference numerals 222 to 232). Steps in order: receive parameter from storage memory and store in local memory; transform parameters into a form for smoothing; perform smoothing for each parameter using parameters from prior and subsequent frames; transform smoothed parameters back into original form; store smoothed parameters in storage memory; decision "More parameters to be smoothed?" with a YES branch.]




[Figure: flowchart, "Decode voice data; perform interframe smoothing" (reference numerals 242 to 252). Steps in order: receive parameters for multiple frames and store in a circular buffer; de-quantize data to obtain LPC parameters; perform smoothing for each parameter using parameters from prior and subsequent frames; generate speech signal waveforms using parameters; read in new parameter data for each circular buffer. A decision block with a YES branch is also shown; its text is not recoverable.]



[Figure: storage memory accesses by frame. Recoverable labels: access 1, access 2, access 3; frame n, frame n+1, frame n+2; parameter entries including P3(n), P2(n+1), P3(n+1), P2(n+2) and P3(n+2).]

[Figure: local frame buffer. Recoverable labels: local frame buffer, current access point, "not available", frame indices n and n+1, and parameter entries including P2(n) and P2(n+1).]



(19) Europäisches Patentamt
European Patent Office
Office européen des brevets

(11) EP 0 731 348 A3

(12) EUROPEAN PATENT APPLICATION

(88) Date of publication A3: 01.04.1998 Bulletin 1998/14

(43) Date of publication A2: 11.09.1996 Bulletin 1996/37

(51) Int. Cl.6: G01L 5/00, G10L 5/00

(21) Application number: 96301574.8

(22) Date of filing: 07.03.1996

(84) Designated Contracting States:
AT BE DE DK ES FI FR GB GR IE IT LU NL PT SE

(30) Priority: 07.03.1995 US 399497

(71) Applicant: ADVANCED MICRO DEVICES INC.
Sunnyvale, California 94088-3453 (US)

(72) Inventors:
• Asghar, Saf
Austin, Texas 78750 (US)
• Ireton, Mark
Austin, Texas 78739 (US)

(74) Representative: BROOKES & MARTIN
High Holborn House
52/54 High Holborn
London, WC1V 6SE (GB)
